System for addressing inefficient query processing

ABSTRACT

A system for analyzing predicate usage and proposing solutions to improve query performance. The system comprises a system for collecting performance statistics for database queries; an aggregation system that organizes the performance statistics based on predicate combinations used in the database queries; and an evaluation system that compares the performances of queries having similar predicate combinations. Further features include a problem identification system that identifies a predicate, which when added to a query, results in a low performing query, and a solution proposal system that automatically proposes a solution to address the low performing query.

This continuation application claims priority to co-pending U.S. patentapplication Ser. No. 10/746,961 entitled SYSTEM AND METHOD FORADDRESSING INEFFICIENT QUERY PROCESSING, filed on Dec. 24, 2003, thecontents of which are hereby incorporated by reference in theirentirety.

BACKGROUND OF THE INVENTION

1. Technical Field

The present invention relates generally to query processing, and morespecifically relates to a system and method for identifying predicatescausing inefficient query processing, and proposing and implementingsolutions to improve query performance.

2. Related Art

In an environment such as SAP BW (or any environment using star schemaor complex SQL), determining the cause of slow query performance can bedifficult. For instance, individual queries must be identified foranalysis, and then the SQL must be captured, explained, and analyzed todetermine the possible reason for the performance problem.

When end-users use drill-down in queries, additional predicates areadded to the query SQL to support the grouping or selection in thedrill-down. Predicates may include any element of a search conditionthat expresses or implies a comparison operation. This means that a userdrill-down creates SQL with similar sets of predicates, which areusually based on a core set of required predicates packaged in thequery. The addition (or removal) of predicates can cause changes inquery performance. If, for example, a predicate which would filter wellis added to a query, but the predicate is not applied early in queryprocessing (due to missing catalog statistics, missing indexes,optimizer problems, etc.), then an automated method of finding thepredicate(s) associated with the slow performance can help to reduce theeffort necessary to solve the problem.

Unfortunately, there exist no automated query tools capable ofdetermining how predicate usage impacts query performance. Accordingly,a need exists for a system and method that can both analyze predicateusage and propose solutions to improve query performance.

SUMMARY OF THE INVENTION

The present invention addresses the above-mentioned problems, as well asothers, by providing a system and method for analyzing predicate usageand proposing solutions to improve query performance. In a first aspect,the invention provides a query predicate analysis system comprising: asystem for collecting performance statistics for database queries; anaggregation system that organizes the performance statistics based onpredicate combinations used in the database queries; and an evaluationsystem that compares the performances of queries having similarpredicate combinations. Further features include a problemidentification system that identifies a predicate, which when added to aquery, results in a low performing query; and a solution proposal systemthat automatically proposes a solution to address the low performingquery.

In a second aspect, the invention provides a method of analyzing queryperformance based on predicate usage, comprising: collecting performancestatistics for database queries; organizing the performance statisticsbased on predicate combinations of the database queries; and analyzingperformances of queries having similar predicate combinations.

In a third aspect, the invention provides a program product stored on arecordable medium for analyzing query performance based on predicateusage, comprising: means for collecting performance statistics fordatabase queries; means for organizing the performance statistics basedon predicate combinations of the database queries; and means foranalyzing the performances of queries having similar predicatecombinations.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features of this invention will be more readilyunderstood from the following detailed description of the variousaspects of the invention taken in conjunction with the accompanyingdrawings in which:

FIG. 1 depicts a query predicate analysis system in accordance with thepresent invention.

FIGS. 2-6 depict a set of tables related to an overview example.

FIGS. 7-15 depict a set of tables related to an SAP example.

DETAILED DESCRIPTION OF THE INVENTION

Referring now to the drawings, FIG. 1 depicts a query predicate analysissystem 10 that operates in conjunction with a database system 11. Itshould be noted that system 10 could be integrated into the databasesystem 11, or be implemented as a separate stand-alone system. Querypredicate analysis system 10 examines query data 24 to improve queryperformance in database system 11. In particular, system 10 analyzesexecution data associated with queries (e.g., query processing rate,local predicates, etc.) and compares statistics for queries havingsimilar characteristics. More specifically, queries referencing the samefact table, and having similar predicates are analyzed and compared.System 10 then determines which predicate combinations are associatedwith a slowdown in query processing rates. That is, system 10 identifiespredicates which when added to a query cause the new query to run moreslowly than the original query. Potential problems can then beidentified, and solutions can be proposed. Thus, decisions regardingcandidates for changes (e.g., predicate columns, columns in tables,etc., which may need additional statistics or indexes) are made based onsymptoms of problems, as indicated by query performance statistics.

The metrics used by system 10 to determine good or bad queryperformances may include any commonly used performance statistics. Aunique feature of this process is that system 10 utilizes a data-miningprocess that correlates unique queries (that is, unique sets of localpredicates) with their performance characteristics, and then looks atthe performance of similar queries. If a slow query has predicates(e.g., ABC) that are a superset of a fast query (e.g., AB), this processexamines the difference between the two sets to find the cause of theperformance degradation, since the introduction of the new predicatemust be related to the cause of the performance degradation

It is understood that while the invention and exemplary embodiments aredescribed generally with reference to query processing in SAP BWrunning, e.g., on DB2/390, the invention could be extended to any DBsystem that maintains statistics similar to those described below.Namely, the present invention recognizes that an important indicator ofperformance problems (i.e., slow selection rate per second) is the sameon any platform. Accordingly, while the specific actions taken toaddress the identified problem might depend on the DB system being used,the general concepts described herein to identify the problem could beapplied to any DB system.

The process utilizes several commonly used performance relatedstatistics for queries, including:

(1) count(*)—rows that satisfy the predicates;

(2) query elapsed time;

(3) getpages—(GP or equivalent metric for DB data/index pagesreferenced); and

(4) rows processed—rows returned to the application after GROUP BY.

From the above, one derives two more commonly used metrics:

(5) query selection rate=count(*)/query elapsed time—this is the keyindicator of query performance; and

(6) pages per row selected=getpages/count(*).

(Note that these statistics are currently not all available via a singleinterface from DB2 or SAP. At this time, they are individuallyavailable, e.g., from catalog statistics, DB2 ifcid 318, SAP querystatistics, etc.).

Exemplary Overview

Consider the following example involving an exemplary star schema object(taken from the SAP BW star schema model) used in the queries below:

-   -   F fact table with columns DIMID1, DIMID2, DIMID3, f_column1,        f_column2    -   D1—dimension table with columns DIMID1, M1SID1, M2SID2    -   D2—dimension table with columns DIMID2, M3SID3, M4SID4    -   D3—dimension table with columns DIMID3, M5SID5, M6SID6    -   M1—master data table with columns MISID1, col1, col1a, . . .    -   M2—master data table with columns M2SID2, col2, col2a, . . .        Etc.    -   M6—master data table with columns M6SID6, col6, col6a, . . .

The DIMIDx columns join the Fact to Dimension tables, and the MxSIDxcolumns join master data to dimension tables.

In a first step, query data 24 is gathered for each query executed, andincludes information about performance and the local predicates in theSQL. For example, if the query is:

select f_column1 from F where F.DIMID1=D1.DIMID1 and F.DIMID2=D2.DIMID2and D1.M1SID1=M1.M1SID1

and D2.M3SID3=M3.M3SID3 and M1.col1a=‘findone’ and M3.col3a=‘findtwo’group by M1.col1, M3.col3,

then FIG. 2 depicts an example of the query data 24 saved for eachquery. Since the column list must contain the set of all columns thatappear in either local predicates, or in the GROUP BY clause, whencolumns appear in the GROUP BY, but not in the local predicates, ashere:

select f_column1 from F where F.DIMID1=D1.DIMID1 and F.DIMID2=D2.DIMID2and D1.M1SID1=M1.M1SID1

and D2.M3SID3=M3.M3SID3 and M1.col1a=‘findone’ and M3.col3a=‘findtwo’

group by M1.col1, M3.col3, M4.col4.

The result would be noted in the query performance statistics as shownin FIG. 3. These raw statistics are then aggregated by aggregationsystem 12, which lists unique sets of local predicates and statisticsfor each unique predicate combination, such as that shown in FIG. 4.Thus, as shown in FIG. 4, performance statistics are organized byaggregation system 12 based on predicate combinations (e.g., columns).

Next, predicate evaluation system 14 compares queries to determine theperformance differences when predicates are added to queries. In thisexample, statistics are gathered for queries that differ by onepredicate. That is, the predicates of one query are a proper subset ofthe predicates in another query, and the superset contains only oneadditional predicate. This comparison could be done with other variants,such as comparing queries that differ by up to N predicates, or forqueries that differ by only one added dimension. If there were moredifferences, then there would be more hypothetical fixes to be tested,as described below.

An ID number column as shown in FIG. 5-7 is added for referencepurposes. The other columns are taken or computed from the summarizedstatistics shown in FIG. 4. Two new values are calculated:

(a) Time, which is the sum of time for queries containing the addedpredicate; and

(b) Estimated improvement, which is an estimate for how much time couldbe saved for queries with the added predicate, if they ran at the samecount/second rate as the queries without the added predicate.

Estimated improvement=Time−(((count/sec with addedpredicate)/(count/second))*Time).

Since there is additional database (db) activity needed to process theadded predicate, the estimated improvement would likely be anoverstatement of the improvement, but it is a reasonable starting pointfor evaluating the impact.

As an alternative, statistics could be evaluated by grouping the queriesby sets of tables/columns in the predicates, without operators, such asthat shown in FIG. 6. If there were few queries, this could help to findpatterns in the added predicate more easily. The disadvantage is thatthere would be more variation in the queries grouped together.

Next, for each fact table depicted in FIG. 5, queries can be ordered by“estimated improvement.” Then, problem identification system 16 andsolution proposal system 18 can evaluate possible causes and changes foreach fact table starting with queries with the largest “estimatedimprovement.” Problem identification system 16 can utilize any logic toidentify the cause of a slowdown. For instance, if count/second has alarge decrease when a predicate is added to a query, this can point toan issue such as that the new predicate filters well, but is not appliedearly enough in the join sequence. This can be confirmed by checkingwhether the GP/count has a large increase. If there is a large increasein GP/count, it means that the database system 11 had to search morepages to find the qualifying rows. The query statistics in FIG. 5, IDnumber 1, are an example of this.

In one exemplary embodiment, problem identification system 16 may selectthe identified problem from a “problem list.” An exemplary problem listfor a slowdown in count/second may, e.g., include the following:

(1) missing or insufficient statistics on the added column (so databasesystem 11 cannot make the best optimizer choice);

(2) missing index on new column (again, database system 11 cannot makethe best optimizer choice);

(3) missing multi-column index on the fact table (in the case there aremultiple filtering predicates in different dimensions);

(4) predicate operator is not index matching (e.g., “status notdelivered,” rather than “status in (backordered, open)”); and

(5) there is correlation between the added predicate column and a columnin the base comparison set of predicates.

Next, solution proposal system 18 could propose possible solutions.Changes could be proposed for administrators, or using feedback outlinedbelow, could be automatically added and tested. Having found a predicate(or table.column) which when added slows the query, problemidentification system 16 and solution proposal system 18 can make ahypothesis as to the cause based on the possible problems listed above(or other possible reasons) and then implement a solution. For example,check for column statistics in DB catalog, and create them if they donot exist, or check for indexes containing the column, and add one if itdoes not exist, and so on.

A feedback system 20 can be utilized to assess the efficacy of animplemented solution. As feedback, after the proposed fix has beenimplemented, query statistics as shown in FIG. 4 for the period beforeand after the change can be compared. If the count/second rate improves,then the proposed fix would be retained. If the performance did notimprove, then the proposed fix would be removed, and another proposedfix could be implemented and evaluated, based on the list of possiblesolutions to the problem, as outlined above.

A regression testing system 22 could also be implemented to assess theoverall impact of an implemented solution. As a regression test, onewould need to evaluate the performance statistics of queries referencingthe changed element. If a new column index on a master data table werecreated, all queries referencing that column (found in the localpredicates in FIG. 4) would be examined. Likewise, if a new multi-columnfact table index were created, all queries using the dimensions in theindex (found in the dimensions column in FIG. 4) would be examined todetermine if they improved, or ran slower.

Exemplary SAP Implementation

SAP currently provides, via its RSA1 transaction, a method of proposingaggregates (summary tables) based on characteristics (i.e., predicates)used in queries. Characteristics are columns in dimension tables ormaster data tables of the star schema. Such summary tables can help toimprove performance of frequently executed queries that usecharacteristics matching the aggregates.

In an ad-hoc query environment, where an aggregate may not be suitablefor improving query performance, due to the wide variety ofcharacteristics used in the queries, adding new multi-column indexes onthe fact table can help to improve performance. A multicolumn fact-tableindex containing only index columns for the dimensions used in a querycan provide access to the fact table that is more efficient than accesswith the standard SAP index structure. The standard SAP index structureis made up of M single column indexes for the M dimensions of the facttable, and a single M column index containing columns for all the Mdimensions on the fact table.

This method is based on the observation made on SAP systems that whenthe rate of “rows selected per second” is slow, the cause may be that abad join order was used by DB2, such that the best filtering dimensionswere applied after the fact table is accessed. If the filteringdimensions are applied after the fact table is accessed, DB2 must readmany extraneous rows out of the fact table, which are filtered after thefact table is accessed, which reduces the “selected per second” rate.This method recognizes the symptom of a possible join order problem,proposes an index, and then checks the performance of the index todetermine whether the index helped and should be kept.

Since SAP query statistics are gathered based on characteristics, whichreside in dimensions or master data tables, and DB2 indexes are createdbased on dimensions, a method is needed to determine the dimensionswhere the characteristics reside, and to determine the sets ofdimensions used by the queries. Once the sets of dimensions used by slowqueries are found, a method is needed to order the index columns for thedimensions.

In the past, this problem was “solved” by examining the SQL for longrunning queries: explaining the statement, and determining thedimensions used. This method is based on sampling, and can be automated.

The present invention addresses this problem as follows. For each queryexecuted, characteristic columns are converted to dimensions todetermine a set of dimensions used by the query. Then, all queries inthe evaluation interval are summarized to evaluate the total DB time(and then weight) of queries using each unique set of dimensions. If twodimension combinations support queries with similar aggregate db time,but one dimension combination has a slower ‘count(*)/second’ rate, thedimension combination with the slower rate will be favored as an indexcreation candidate. Based on the weight of each dimension combination,new multi-column indexes on fact tables are proposed to optimize accessto the fact table.

The process is based on symptoms of access path problems (that is, low‘rows selected per second’ rate) rather than based on analysis ofpredicates and database statistics.

The process contains a feedback loop, where the benefit of new indexescan be tested, to determine if the problem symptom (slow select rate)was helped by the new index. Query statistics are aggregated before andafter creation of an index, and the ‘count(*)/second’ rate is comparedfor groups of queries using the same set of dimensions. If thecount(*)/second rate improves with a new index, then the new index hashelped performance.

The process contains a weighting system for created indexes, wheredifferent indexes can be evaluated, to determine which provides the mostbenefit to system performance. The performance benefit of a new index isused in conjunction with the query statistics to determine the ‘timesaved’ by the new index. Different indexes can be compared, to determinewhich has the largest ‘time saved’, and thus the largest impact onperformance.

(A) Proposal of New Multi-Column Fact Table Indexes

The first step (Step 1) is to extract information about query statisticsover a time interval, and build an array of statistics rows, where eachrow contains the columns:

-   a) STATUID—the unique query identifier in SAP-   b) Infocube specified at query execution (from RSDDSTAT.INFOCUBE)-   c) DB request time (from RSDDSTAT.QTIMEDB)-   d) Rows satisfying the predicates—that is, count(*)    (RSDDSTAT.QDBSEL)-   e) Rows after grouping (RSDDSTAT.QDBTRANS)-   f) characteristics (collect all values of RSDDSTATAGGRDEF.IOBJNM    where RSDDSTATAGGRDEF.AGGRST NE space)-   g) querycube used at execution time (RSDDSTATAGGRDEF.QUERYCUBE)-   h) dimensions (empty at this stage)

FIG. 7 depicts a sample from the first step containing four queries. Ina second step (step 2), the dimensions are concatenated and sorted asfollows:

for each row in the array built above:

for each unique characteristic (IOBJNM) in the row

select dimension from RSDDIMEIOBJ where RSDDIMEIOBJ.INFOCUBE=querycubeand RSDDIMEIOBM.IOBJNM=characteristic

concatenate the dimension to the current dimension list, if it is not amember of the list

endfor characteristic

sort the dimension list

endfor row

If an SAP data dictionary (table RSDDIMEIOBJ) contains:

dimension infocube iobjnm D1 Z A D1 Z B D2 Z C D2 Z D D3 Z E D4 Z FThen step 2 yields an array containing rows such as these, where thereis one row for each SAP query, querycube pair, as shown in FIG. 8.

The next step (step 3) is to merge (i.e., aggregate) all rows containingidentical dimension lists, summing db time, count(*), and rows aftergrouping, and creating execution count. “Weighted db time” is (dbtime/sqrt(count(*)/db time)), in order to give larger weight todimension combinations that return data slower. For example:

db time dimensions count(*) weighted db time 100 D1 D3 100000 3.16 60 D3D4 10000 4.64

So in this case, even though D3 D4 has a smaller db time, it has ahigher weighted db time, due to its slow ‘count(*)/db time’ ratio.

Other formulas could be used to create a “weighted db time,” dependingon how much one wants to favor slow dimension combinations. The goal ofthe formula is that if there are two dimension combinations with similardb time, that the combination with slow count(*)/dbtime will be favoredas a candidate for index creation. See for instance FIG. 9.

Next, proposed index column order can be assigned in the following way:

for each row in array (3) where the count of dimensions is more than one

recursively on the dimensions of the row starting with the dimensioncount (X) until the count of dimensions is 1

find the set of X−1 dimensions with the largest “weighted db time” whosedimensions are a proper subset of the set of X dimensions

assign position X to the dimension which is in the set of X dimensions,but not in the set of X−1 dimensions with the longest “weighted db time”

add the exec count, db time, count(*), rows after grouping, and weighteddb time for the set of X−1 dimensions with the longest weighted db timeto the statistics for the X dimension set

reduce X by 1

end recursionend for each row

This yields the table depicted in FIG. 10. Next, as shown in FIG. 11,the list is ordered by count of dimensions, and weighted db time; thesingle column indexes is deleted since by default there are alreadysingle-column indexes for each dimension of the fact table.

For each N, where N is the count of dimensions in the proposed index,the set of dimensions with the longest query database time, and thus thegreatest potential for improvement, would be at the top of the list. Onecould adjust the process to discard any proposed indexes with more thanN columns. If an index with N columns were chosen for creation, then anyindexes with fewer columns, where the smaller indexes match the leadingcolumns of the N column index, would not be creation candidates, asthese smaller indexes are contained in the N column index.

(B) Feedback to Evaluate Effectiveness of Indexes

After a multi-column index has been created on the fact table, the querystatistics for queries before and after the creation of the index can becompared, to determine if the index has improved performance, such asthat shown in FIG. 12. Summing the queries by dimension list, before andafter the addition of the index, yields the table depicted in FIG. 13.

Next, for each unique set of dimensions, the performance benefit of theindexes is calculated, by computing the ratio of count(*)/second afterand before for each set of characteristic combinations contained in theindexes:

D1 D2 D4->(7300/200)/(7300/830)=36.5/8.79=4.15

Three dimension queries execute 415% times faster than before.

D1 D2->(85000/580)/(90000/700)=146.55/128.57=1.13

Two dimension queries execute 13% faster than before.

If the index helped some queries, and hindered others, then the totalvaluation done below will calculate the total benefit of the index, todetermine if the index should be kept.

(C) Comparing Multi-Column Indexes, to Determine Which Provides MoreBenefit:

If there are several multi-column indexes that have been created, onecan evaluate which provides the largest overall system benefit. Havingcomputed the performance benefit from a new index for individualcharacteristic combinations, one can compare indexes in this way:

Index Performance Columns querycube benefit D1 D2 D4 Z 4.15 D1 D2 Z 1.13D1 D3 M 1.4 D1 D3 D4 M 1.7First, the saved time based on the query statistics is computed as shownin FIG. 14 wherein: savedtime=((db time*perf benefit)−db time). Next, asshown in FIG. 15, all the characteristic combinations that could besupported by each index are summed. If saved time is negative, then theindex has hindered performance, and would be deleted.

(D) Note on Symptom Based Evaluation

When a new index with high ‘weighted db time’ is proposed in step 1, theindex performance benefit can be calculated (Section B) to verify thatit helped performance, and then the index can be compared to existingindexes (Section C), to determine from among several indexes which indexis least valuable, in order to delete indexes which provide littlebenefit.

The valuation process (Section B) and comparison process (Section C) donot need to know whether the new index was used at execution time. Theyare symptom based, and not access path based. The feedback uses symptoms(that is a change in the rate of rows per second) to determine whetherthe index helped or not. It is not necessary to explain the statementand find the access path, to determine whether the change was helpful.In fact, if the new index was used, and caused performance to be worse,the feedback process and index valuation would recognize that so theindex could be deleted.

This method does not estimate the value of an index before it iscreated. It looks for index candidates, and evaluates them aftercreation, to learn what worked, and what did not work.

It is understood that the systems, functions, mechanisms, methods, andmodules described herein can be implemented in hardware, software, or acombination of hardware and software. They may be implemented by anytype of computer system or other apparatus adapted for carrying out themethods described herein. A typical combination of hardware and softwarecould be a general-purpose computer system with a computer program that,when loaded and executed, controls the computer system such that itcarries out the methods described herein. Alternatively, a specific usecomputer, containing specialized hardware for carrying out one or moreof the functional tasks of the invention could be utilized. The presentinvention can also be embedded in a computer program product, whichcomprises all the features enabling the implementation of the methodsand functions described herein, and which—when loaded in a computersystem—is able to carry out these methods and functions. Computerprogram, software program, program, program product, or software, in thepresent context mean any expression, in any language, code or notation,of a set of instructions intended to cause a system having aninformation processing capability to perform a particular functioneither directly or after either or both of the following: (a) conversionto another language, code or notation; and/or (b) reproduction in adifferent material form.

The foregoing description of the preferred embodiments of the inventionhas been presented for purposes of illustration and description. Theyare not intended to be exhaustive or to limit the invention to theprecise form disclosed, and obviously many modifications and variationsare possible in light of the above teachings. Such modifications andvariations that are apparent to a person skilled in the art are intendedto be included within the scope of this invention as defined by theaccompanying claims.

1. A query predicate analysis system comprising: a system for collectingperformance statistics for database queries, wherein the databasequeries include statements submitted by end-users having differentpredicate combinations; an aggregation system that organizes theperformance statistics based on the different predicate combinations ofthe database queries; and an evaluation system that analyzes performanceof a subset of the database queries having different predicatecombinations that include common predicates.
 2. The system of claim 1,further comprising a problem identification system that identifies afirst predicate which when added to an identified database query resultsin a substantial degradation of database query performance.
 3. Thesystem of claim 2, further comprising a solution proposal system thatautomatically proposes a solution to address the substantial degradationof database query performance.
 4. The system of claim 3, furthercomprising a feedback system for comparing the performance of theidentified database query before and after an implementation of aproposed solution.
 5. The system of claim 3, further comprising aregression testing system for evaluating performance of a second subsetof database queries referencing a changed predicate.
 6. The system ofclaim 1, wherein the performance statistics include: rows that satisfythe predicates, query elapsed time, index pages referenced, and rowsprocessed.
 7. A computer program product comprising a computer usablemedium having computer usable program code for analyzing queryperformance based on predicate usage, which when executed on a computercomprises: means for collecting performance statistics for databasequeries comprising database statements entered by end-users havingdifferent predicate combinations; means for organizing the performancestatistics based on the different predicate combinations of the databasequeries; means for analyzing the performances of a subset of thedatabase queries having different predicate combinations that includecommon predicates; and means for generating analysis results.
 8. Thecomputer program product of claim 7, wherein each predicate combinationcomprises a set of dimensions.
 9. The computer program product of claim7, further comprising means for generating a performance improvementmeasure for each of a second subset of the database queries based on anorder of the predicates.
 10. The computer program product of claim 7,further comprising means for proposing a solution to address a databasequery from the subset of database queries identified as a low performer.11. The computer program product of claim 10, further comprising meansfor comparing the performance of a database query from the subset ofdatabase queries before and after an implementation of a proposedsolution.
 12. The computer program product of claim 7, wherein theperformance statistics are selected from the group consisting of: rowsthat satisfy a set of predicates, query elapsed time, index pagesreferenced, and rows processed.
 13. A query predicate analysis systemcomprising: a system for collecting performance statistics for databasequeries, wherein the database queries include statements submitted byend-users having different predicate combinations; an aggregation systemthat organizes the performance statistics based on the differentpredicate combinations of the database queries; and an evaluation systemthat analyzes performance of a subset of the database queries havingdifferent predicate combinations that include common predicates, whereinthe evaluation system calculates an estimate improvement for adding apredicate given by: Estimated improvement=Time−(((count/second withadded predicate)/(count/second))*Time), where Time is a sum of time fordatabase queries containing an added predicate, count/second is a ratefor running a database query without the added predicate andcount/second with added predicate is a rate for running the databasequery with the added predicate.
 14. A query predicate analysis systemcomprising: a system for collecting performance statistics for databasequeries, wherein the database queries include statements submitted byend-users having different predicate combinations; an aggregation systemthat organizes the performance statistics based on the differentpredicate combinations of the database queries; an evaluation systemthat analyzes performance of a subset of the database queries havingdifferent predicate combinations that include common predicates; and aproblem identification system that identifies a first predicate whichwhen added to an identified database query results in a substantialdegradation of database query performance, wherein the problemidentification system selects a problem from a problem list including:missing statistics on an added column; missing index on a new column;missing multi-column index on a fact table; and the first predicatecontains a non-indexable operator.